Informative Variables Selection for Multi-relational Supervised Learning

Authors

Dhafer Lahbib

Marc Boullé

Dominique Laurent

Abstract

In multi-relational data mining, data are represented in a relational form in which the individuals of the target table are potentially related to several records in secondary tables through one-to-many relationships. To cope with this one-to-many setting, most existing approaches transform the multi-table representation, typically by propositionalisation, thereby losing the naturally compact initial representation and possibly introducing statistical bias. Our approach aims to directly evaluate the informativeness of the original input variables over the relational domain with respect to the target variable. The idea is to summarize, for each individual, the information contained in a non-target table variable by a tuple of features representing the cardinalities of the initial modalities. Multivariate grid models are used to assess the joint information brought by the new features, which is equivalent to estimating the conditional density of the target variable given the input variable in the non-target table. Preliminary experiments on artificial and real data sets show that the approach can potentially identify relevant one-to-many variables. In this article, we focus on binary variables because of space constraints.
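The summarization step described in the abstract can be illustrated with a minimal Python sketch for the binary case. It assumes the secondary table is given as `(individual_id, value)` pairs with `value` in `{0, 1}`; the function names, the toy data, and the fixed-width grid are all hypothetical and chosen for illustration only. In particular, the fixed cell width is a simple stand-in and not the grid-model criterion used by the authors.

```python
from collections import Counter, defaultdict


def cardinality_features(records):
    """Map each individual to (n0, n1): the number of its secondary-table
    records whose binary variable equals 0 and 1, respectively."""
    counts = defaultdict(Counter)
    for individual_id, value in records:
        counts[individual_id][value] += 1
    return {i: (c[0], c[1]) for i, c in counts.items()}


def grid_class_frequencies(features, targets, cell_width=2):
    """Partition the (n0, n1) plane into square cells of a fixed width
    (an assumption, not an optimized grid) and count target classes per cell."""
    grid = defaultdict(Counter)
    for individual_id, (n0, n1) in features.items():
        cell = (n0 // cell_width, n1 // cell_width)
        grid[cell][targets[individual_id]] += 1
    return {cell: dict(c) for cell, c in grid.items()}


# Hypothetical toy data: three individuals and their secondary records.
records = [("a", 1), ("a", 1), ("a", 0),
           ("b", 0), ("b", 0),
           ("c", 1), ("c", 1), ("c", 1)]
targets = {"a": "yes", "b": "no", "c": "yes"}

features = cardinality_features(records)
grid = grid_class_frequencies(features, targets)
print(features)
print(grid)
```

The per-cell class counts give a rough empirical estimate of the target distribution conditional on the pair (n0, n1). Individuals with no records in the secondary table do not appear in `features` in this sketch and would need to be added with the tuple (0, 0).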


Similar resources

Itemset-Based Variable Construction in Multi-relational Supervised Learning

In multi-relational data mining, data are represented in a relational form where the individuals of the target table are potentially related to several records in secondary tables in one-to-many relationship. In this paper, we introduce an itemset based framework for constructing variables in secondary tables and evaluating their conditional information for the supervised classification task. W...


Exploring the Gap Between Variable Selection and Dimensionality Reduction

The Problem: This project addresses the gap between variable selection algorithms and dimensionality reduction algorithms. Variable selection algorithms are designed to produce sparse solutions in which only a few variables are marked as relevant. This is not suitable for highly correlated data such as the gray values of an image. Dimensionality reduction algorithms (e.g., PCA) tend to combine al...


Locally Consistent Bayesian Network Scores for Multi-Relational Data

An important task for relational learning is Bayesian network (BN) structure learning. A fundamental component of structure learning is a model selection score that measures how well a model fits a dataset. We describe a new method that upgrades for multi-relational databases, a loglinear BN score designed for single-table i.i.d. data. Chickering and Meek showed that for i.i.d. data, standard B...


Towards Automatic Feature Construction for Supervised Classification

We suggest an approach to automate variable construction for supervised learning, especially in the multi-relational setting. Domain knowledge is specified by describing the structure of the data by means of variables, tables and links across tables, and by choosing construction rules. The space of variables that can be constructed is virtually infinite, which raises both combinatorial and over-fi...


Evolutionary Approaches to the Learning of Fuzzy Rule-Based Classification Systems

The learning of a Fuzzy Rule-Based Classification System (FRBCS) by means of a supervised inductive process fundamentally involves four mutually complementary tasks: the selection of the most informative variables for the classification problem to be solved, the generation of a set of rules, the selection of the subset of rules with the best co-operation and the least redundancy, and the e...




Publication date: 2011
